{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to Forecast a Time Series with Python\n",
    "\n",
    "Wouldn't it be nice to know the future? This is the notebook that relates to the blog post on medium. Please check the blog for visualizations and explanations, this notebook is really just for the code :)\n",
    "\n",
    "\n",
    "## Processing the Data\n",
    "\n",
    "Let's explore the Industrial production of electric and gas utilities in the United States, from the years 1985-2018, with our frequency being Monthly production output.\n",
    "\n",
    "You can access this data here: https://fred.stlouisfed.org/series/IPG2211A2N\n",
    "\n",
    "This data measures the real output of all relevant establishments located in the United States, regardless of their ownership, but not those located in U.S. territories."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>IPG2211A2N</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DATE</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1985-01-01</th>\n",
       "      <td>72.5052</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-02-01</th>\n",
       "      <td>70.6720</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-03-01</th>\n",
       "      <td>62.4502</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-04-01</th>\n",
       "      <td>57.4714</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-05-01</th>\n",
       "      <td>55.3151</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            IPG2211A2N\n",
       "DATE                  \n",
       "1985-01-01     72.5052\n",
       "1985-02-01     70.6720\n",
       "1985-03-01     62.4502\n",
       "1985-04-01     57.4714\n",
       "1985-05-01     55.3151"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import pandas as pd\n",
    "data = pd.read_csv(\"Electric_Production.csv\",index_col=0)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Right now our index is actually just a list of strings that look like a date, we'll want to adjust these to be timestamps, that way our forecasting analysis will be able to interpret these values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['1985-01-01', '1985-02-01', '1985-03-01', '1985-04-01', '1985-05-01',\n",
       "       '1985-06-01', '1985-07-01', '1985-08-01', '1985-09-01', '1985-10-01',\n",
       "       ...\n",
       "       '2017-04-01', '2017-05-01', '2017-06-01', '2017-07-01', '2017-08-01',\n",
       "       '2017-09-01', '2017-10-01', '2017-11-01', '2017-12-01', '2018-01-01'],\n",
       "      dtype='object', name='DATE', length=397)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data.index = pd.to_datetime(data.index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>IPG2211A2N</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DATE</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1985-01-01</th>\n",
       "      <td>72.5052</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-02-01</th>\n",
       "      <td>70.6720</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-03-01</th>\n",
       "      <td>62.4502</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-04-01</th>\n",
       "      <td>57.4714</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-05-01</th>\n",
       "      <td>55.3151</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            IPG2211A2N\n",
       "DATE                  \n",
       "1985-01-01     72.5052\n",
       "1985-02-01     70.6720\n",
       "1985-03-01     62.4502\n",
       "1985-04-01     57.4714\n",
       "1985-05-01     55.3151"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatetimeIndex(['1985-01-01', '1985-02-01', '1985-03-01', '1985-04-01',\n",
       "               '1985-05-01', '1985-06-01', '1985-07-01', '1985-08-01',\n",
       "               '1985-09-01', '1985-10-01',\n",
       "               ...\n",
       "               '2017-04-01', '2017-05-01', '2017-06-01', '2017-07-01',\n",
       "               '2017-08-01', '2017-09-01', '2017-10-01', '2017-11-01',\n",
       "               '2017-12-01', '2018-01-01'],\n",
       "              dtype='datetime64[ns]', name='DATE', length=397, freq=None)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's first make sure that the data doesn't have any missing data points:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>IPG2211A2N</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DATE</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [IPG2211A2N]\n",
       "Index: []"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[pd.isnull(data['IPG2211A2N'])]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's also rename this column since its hard to remember what \"IPG2211A2N\" code stands for:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data.columns = ['Energy Production']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Energy Production</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DATE</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1985-01-01</th>\n",
       "      <td>72.5052</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-02-01</th>\n",
       "      <td>70.6720</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-03-01</th>\n",
       "      <td>62.4502</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-04-01</th>\n",
       "      <td>57.4714</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-05-01</th>\n",
       "      <td>55.3151</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Energy Production\n",
       "DATE                         \n",
       "1985-01-01            72.5052\n",
       "1985-02-01            70.6720\n",
       "1985-03-01            62.4502\n",
       "1985-04-01            57.4714\n",
       "1985-05-01            55.3151"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import plotly\n",
    "# plotly.tools.set_credentials_file()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'https://plot.ly/~Pierian-Data/12'"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from plotly.plotly import plot_mpl\n",
    "from statsmodels.tsa.seasonal import seasonal_decompose\n",
    "result = seasonal_decompose(data, model='multiplicative')\n",
    "fig = result.plot()\n",
    "plot_mpl(fig)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import plotly.plotly as ply\n",
    "import cufflinks as cf\n",
    "# Check the docs on setting up offline plotting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.iplot(title=\"Energy Production Jan 1985--Jan 2018\", theme='pearl')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\Marcial\\Anaconda3\\lib\\site-packages\\statsmodels\\compat\\pandas.py:56: FutureWarning:\n",
      "\n",
      "The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from pyramid.arima import auto_arima"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**he AIC measures how well a model fits the data while taking into account the overall complexity of the model. A model that fits the data very well while using lots of features will be assigned a larger AIC score than a model that uses fewer features to achieve the same goodness-of-fit. Therefore, we are interested in finding the model that yields the lowest AIC value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=1782.527, BIC=1802.447, Fit time=0.866 seconds\n",
      "Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 1, 0, 12); AIC=nan, BIC=nan, Fit time=nan seconds\n",
      "Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 1, 0, 12); AIC=1942.040, BIC=1957.976, Fit time=0.238 seconds\n",
      "Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=1837.289, BIC=1853.224, Fit time=0.361 seconds\n",
      "Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 1, 1, 12); AIC=1783.875, BIC=1807.778, Fit time=1.434 seconds\n",
      "Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 0, 12); AIC=1920.884, BIC=1936.820, Fit time=0.451 seconds\n",
      "Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 2, 12); AIC=1784.212, BIC=1808.116, Fit time=3.491 seconds\n",
      "Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 1, 2, 12); AIC=1781.900, BIC=1809.788, Fit time=3.586 seconds\n",
      "Fit ARIMA: order=(0, 1, 1) seasonal_order=(1, 1, 2, 12); AIC=1837.164, BIC=1861.067, Fit time=1.287 seconds\n",
      "Fit ARIMA: order=(2, 1, 1) seasonal_order=(1, 1, 2, 12); AIC=1782.648, BIC=1814.520, Fit time=4.348 seconds\n",
      "Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 1, 2, 12); AIC=1852.587, BIC=1876.490, Fit time=1.008 seconds\n",
      "Fit ARIMA: order=(1, 1, 2) seasonal_order=(1, 1, 2, 12); AIC=1781.940, BIC=1813.811, Fit time=4.222 seconds\n",
      "Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 1, 2, 12); AIC=1864.184, BIC=1884.103, Fit time=1.035 seconds\n",
      "Fit ARIMA: order=(2, 1, 2) seasonal_order=(1, 1, 2, 12); AIC=1782.722, BIC=1818.578, Fit time=5.144 seconds\n",
      "Fit ARIMA: order=(1, 1, 1) seasonal_order=(2, 1, 2, 12); AIC=1772.539, BIC=1804.410, Fit time=4.551 seconds\n",
      "Fit ARIMA: order=(1, 1, 1) seasonal_order=(2, 1, 1, 12); AIC=1771.295, BIC=1799.182, Fit time=2.270 seconds\n",
      "Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 1, 0, 12); AIC=1870.049, BIC=1889.969, Fit time=0.974 seconds\n",
      "Fit ARIMA: order=(0, 1, 1) seasonal_order=(2, 1, 1, 12); AIC=1825.210, BIC=1849.114, Fit time=0.999 seconds\n",
      "Fit ARIMA: order=(2, 1, 1) seasonal_order=(2, 1, 1, 12); AIC=1772.010, BIC=1803.881, Fit time=3.391 seconds\n",
      "Fit ARIMA: order=(1, 1, 0) seasonal_order=(2, 1, 1, 12); AIC=1842.551, BIC=1866.454, Fit time=0.808 seconds\n",
      "Fit ARIMA: order=(1, 1, 2) seasonal_order=(2, 1, 1, 12); AIC=1771.612, BIC=1803.484, Fit time=4.359 seconds\n",
      "Fit ARIMA: order=(0, 1, 0) seasonal_order=(2, 1, 1, 12); AIC=1855.606, BIC=1875.526, Fit time=0.641 seconds\n",
      "Fit ARIMA: order=(2, 1, 2) seasonal_order=(2, 1, 1, 12); AIC=1773.049, BIC=1808.904, Fit time=5.352 seconds\n",
      "Fit ARIMA: order=(1, 1, 1) seasonal_order=(2, 1, 0, 12); AIC=1813.388, BIC=1837.291, Fit time=3.069 seconds\n",
      "Total fit time: 53.889 seconds\n"
     ]
    }
   ],
   "source": [
    "stepwise_model = auto_arima(data, start_p=1, start_q=1,\n",
    "                           max_p=3, max_q=3, m=12,\n",
    "                           start_P=0, seasonal=True,\n",
    "                           d=1, D=1, trace=True,\n",
    "                           error_action='ignore',  \n",
    "                           suppress_warnings=True, \n",
    "                           stepwise=True) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1771.2948217037836"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stepwise_model.aic()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train Test Split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Energy Production</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DATE</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1985-01-01</th>\n",
       "      <td>72.5052</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-02-01</th>\n",
       "      <td>70.6720</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-03-01</th>\n",
       "      <td>62.4502</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-04-01</th>\n",
       "      <td>57.4714</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1985-05-01</th>\n",
       "      <td>55.3151</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Energy Production\n",
       "DATE                         \n",
       "1985-01-01            72.5052\n",
       "1985-02-01            70.6720\n",
       "1985-03-01            62.4502\n",
       "1985-04-01            57.4714\n",
       "1985-05-01            55.3151"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "DatetimeIndex: 397 entries, 1985-01-01 to 2018-01-01\n",
      "Data columns (total 1 columns):\n",
      "Energy Production    397 non-null float64\n",
      "dtypes: float64(1)\n",
      "memory usage: 26.2 KB\n"
     ]
    }
   ],
   "source": [
    "data.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll train on 20 years of data, from the years 1985-2015 and test our forcast on the years after that and compare it to the real data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "train = data.loc['1985-01-01':'2016-12-01']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Energy Production</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DATE</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2016-08-01</th>\n",
       "      <td>115.5159</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016-09-01</th>\n",
       "      <td>102.7637</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016-10-01</th>\n",
       "      <td>91.4867</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016-11-01</th>\n",
       "      <td>92.8900</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016-12-01</th>\n",
       "      <td>112.7694</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Energy Production\n",
       "DATE                         \n",
       "2016-08-01           115.5159\n",
       "2016-09-01           102.7637\n",
       "2016-10-01            91.4867\n",
       "2016-11-01            92.8900\n",
       "2016-12-01           112.7694"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "test = data.loc['2015-01-01':]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Energy Production</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DATE</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2015-01-01</th>\n",
       "      <td>120.2696</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-02-01</th>\n",
       "      <td>116.3788</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-03-01</th>\n",
       "      <td>104.4706</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-04-01</th>\n",
       "      <td>89.7461</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-05-01</th>\n",
       "      <td>91.0930</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Energy Production\n",
       "DATE                         \n",
       "2015-01-01           120.2696\n",
       "2015-02-01           116.3788\n",
       "2015-03-01           104.4706\n",
       "2015-04-01            89.7461\n",
       "2015-05-01            91.0930"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Energy Production</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DATE</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2017-09-01</th>\n",
       "      <td>98.6154</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-10-01</th>\n",
       "      <td>93.6137</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-11-01</th>\n",
       "      <td>97.3359</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-12-01</th>\n",
       "      <td>114.7212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2018-01-01</th>\n",
       "      <td>129.4048</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Energy Production\n",
       "DATE                         \n",
       "2017-09-01            98.6154\n",
       "2017-10-01            93.6137\n",
       "2017-11-01            97.3359\n",
       "2017-12-01           114.7212\n",
       "2018-01-01           129.4048"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "37"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ARIMA(callback=None, disp=0, maxiter=50, method=None, order=(1, 1, 1),\n",
       "   out_of_sample_size=0, scoring='mse', scoring_args={},\n",
       "   seasonal_order=(2, 1, 1, 12), solver='lbfgs', start_params=None,\n",
       "   suppress_warnings=True, transparams=True, trend='c')"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stepwise_model.fit(train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "future_forecast = stepwise_model.predict(n_periods=37)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 121.0069841 ,  110.05829773,  100.66214128,   90.6704282 ,\n",
       "         92.16164339,  103.22814008,  112.50690673,  112.11184857,\n",
       "        101.04339558,   92.07696292,   95.82735302,  111.26301564,\n",
       "        120.21587944,  111.32815023,  102.16625906,   90.55576703,\n",
       "         92.13024541,  102.88613935,  111.8729164 ,  111.06722082,\n",
       "        100.84852305,   92.07136347,   95.82277211,  109.23003471,\n",
       "        119.35147539,  110.56205836,  100.99051798,   90.20673887,\n",
       "         91.75667578,  102.973546  ,  112.20965839,  111.68458911,\n",
       "        101.10323951,   91.83416937,   95.08657978,  109.42245514,\n",
       "        119.38303158])"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "future_forecast"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "future_forecast = pd.DataFrame(future_forecast,index = test.index,columns=['Prediction'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Prediction</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DATE</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2015-01-01</th>\n",
       "      <td>121.006984</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-02-01</th>\n",
       "      <td>110.058298</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-03-01</th>\n",
       "      <td>100.662141</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-04-01</th>\n",
       "      <td>90.670428</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-05-01</th>\n",
       "      <td>92.161643</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Prediction\n",
       "DATE                  \n",
       "2015-01-01  121.006984\n",
       "2015-02-01  110.058298\n",
       "2015-03-01  100.662141\n",
       "2015-04-01   90.670428\n",
       "2015-05-01   92.161643"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "future_forecast.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Energy Production</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DATE</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2015-01-01</th>\n",
       "      <td>120.2696</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-02-01</th>\n",
       "      <td>116.3788</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-03-01</th>\n",
       "      <td>104.4706</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-04-01</th>\n",
       "      <td>89.7461</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-05-01</th>\n",
       "      <td>91.0930</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Energy Production\n",
       "DATE                         \n",
       "2015-01-01           120.2696\n",
       "2015-02-01           116.3788\n",
       "2015-03-01           104.4706\n",
       "2015-04-01            89.7461\n",
       "2015-05-01            91.0930"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https://plot.ly/~Pierian-Data/16.embed\" height=\"525px\" width=\"100%\"></iframe>"
      ],
      "text/plain": [
       "<plotly.tools.PlotlyDisplay object>"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([test,future_forecast],axis=1).iplot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "future_forecast2 = future_forcast"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https://plot.ly/~Pierian-Data/18.embed\" height=\"525px\" width=\"100%\"></iframe>"
      ],
      "text/plain": [
       "<plotly.tools.PlotlyDisplay object>"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([data,future_forecast2],axis=1).iplot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
